Karush-Kuhn-Tucker Condition
We consider the following problem:
$$
\begin{array}{ll}
\text{minimize}   & f(x) \\
\text{subject to} & h(x) = 0, \\
                  & g(x) \le 0,
\end{array}
$$
where $f:\mathbb{R}^n \to \mathbb{R}$, $h:\mathbb{R}^n \to \mathbb{R}^m$, $m \le n$, and $g:\mathbb{R}^n \to \mathbb{R}^p$. For the general problem above, we adopt the following definitions.
Definition
An inequality constraint $g_j(x) \le 0$ is said to be active at $x^*$ if $g_j(x^*) = 0$. It is inactive at $x^*$ if $g_j(x^*) < 0$.
By convention, we consider an equality constraint $h_i(x) = 0$ to be always active.
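For a quick illustration (an example of ours, not part of the original text), take $g_1(x) = x_1^2 + x_2^2 - 1$ and $g_2(x) = -x_2$. At the point $x^* = (1, 0)^\top$ we have $g_1(x^*) = 0$ and $g_2(x^*) = 0$, so both constraints are active there, whereas at $x^* = (0, 0)^\top$ we have $g_1(x^*) = -1 < 0$ and $g_2(x^*) = 0$, so only $g_2$ is active.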
Definition
Let $x^*$ satisfy $h(x^*) = 0$, $g(x^*) \le 0$, and let $J(x^*)$ be the index set of active inequality constraints:
$$
J(x^*) \triangleq \{\, j : g_j(x^*) = 0 \,\}.
$$
Then, we say that $x^*$ is a regular point if the vectors
$$
\nabla h_i(x^*), \quad \nabla g_j(x^*), \qquad 1 \le i \le m, \; j \in J(x^*),
$$
are linearly independent.
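Continuing the illustration above (again ours, not from the text): at $x^* = (1, 0)^\top$ we have $J(x^*) = \{1, 2\}$, and the gradients $\nabla g_1(x^*) = (2, 0)^\top$ and $\nabla g_2(x^*) = (0, -1)^\top$ are linearly independent, so $x^*$ is a regular point.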
We now prove a first-order necessary condition for a point to be a local minimizer. We call this condition the Karush-Kuhn-Tucker (KKT) condition. In the literature, this condition is sometimes also called the Kuhn-Tucker condition.
Karush-Kuhn-Tucker (KKT) Theorem
Let $f, h, g \in \mathcal{C}^1$. Let $x^*$ be a regular point and a local minimizer for the problem of minimizing $f$ subject to $h(x) = 0$, $g(x) \le 0$. Then, there exist $\lambda^* \in \mathbb{R}^m$ and $\mu^* \in \mathbb{R}^p$ such that:
1. $\mu^* \ge 0$;
2. $Df(x^*) + \lambda^{*\top} Dh(x^*) + \mu^{*\top} Dg(x^*) = 0^\top$;
3. $\mu^{*\top} g(x^*) = 0$.
In this theorem, we refer to λ∗ as the Lagrange multiplier vector and μ∗ as the Karush-Kuhn-Tucker (KKT) multiplier vector. We refer to their components as Lagrange multipliers and Karush-Kuhn-Tucker (KKT) multipliers, respectively.
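A compact restatement (our paraphrase, though a standard one): if we define the Lagrangian $l(x, \lambda, \mu) \triangleq f(x) + \lambda^\top h(x) + \mu^\top g(x)$, then condition 2 says exactly that $D_x l(x^*, \lambda^*, \mu^*) = 0^\top$; that is, $x^*$ is a stationary point of the Lagrangian with the multipliers fixed at $\lambda^*$ and $\mu^*$.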
Before proving this theorem, let us first discuss its meaning. Observe that $\mu_j^* \ge 0$ (by condition 1) and $g_j(x^*) \le 0$. Therefore, the condition
$$
\mu^{*\top} g(x^*) = \mu_1^* g_1(x^*) + \cdots + \mu_p^* g_p(x^*) = 0
$$
implies that if $g_j(x^*) < 0$, then $\mu_j^* = 0$; that is, for all $j \notin J(x^*)$, we have $\mu_j^* = 0$. In other words, the KKT multipliers $\mu_j^*$ corresponding to inactive constraints are zero. The other KKT multipliers, $\mu_i^*$, $i \in J(x^*)$, are nonnegative; they may or may not be equal to zero.
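To make the conditions concrete, the following minimal Python sketch checks them numerically at a candidate point for a small convex problem of our own devising (the objective, constraints, and candidate point are illustrative assumptions, not taken from the text; for simplicity there are no equality constraints, i.e., $m = 0$):

```python
import numpy as np

# Illustrative problem (our choice, not from the text):
#   minimize   f(x) = (x1 - 2)^2 + (x2 - 1)^2
#   subject to g1(x) = x1^2 - x2   <= 0
#              g2(x) = x1 + x2 - 2 <= 0
# Candidate point: x* = (1, 1), where both constraints are active.

def grad_f(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])

def g(x):
    return np.array([x[0] ** 2 - x[1], x[0] + x[1] - 2.0])

def jac_g(x):
    # Rows are the gradients of g1 and g2, i.e., the Jacobian Dg(x).
    return np.array([[2.0 * x[0], -1.0],
                     [1.0, 1.0]])

x_star = np.array([1.0, 1.0])
active = np.isclose(g(x_star), 0.0)   # the index set J(x*)
A = jac_g(x_star)[active]             # gradients of the active constraints

# Solve Df(x*) + mu^T Dg(x*) = 0^T for the active multipliers;
# the inactive multipliers are zero by complementary slackness.
mu_active, *_ = np.linalg.lstsq(A.T, -grad_f(x_star), rcond=None)

print("active constraints:   ", active)                             # [ True  True]
print("KKT multipliers:      ", mu_active)                          # ~[2/3, 2/3] >= 0
print("stationarity residual:", grad_f(x_star) + A.T @ mu_active)   # ~[0, 0]
```

Both multipliers come out to $2/3 \ge 0$ and the stationarity residual vanishes, so the candidate point satisfies all three KKT conditions; because this particular problem happens to be convex, the point is in fact a global minimizer.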
Proof
Let $x^*$ be a regular local minimizer of $f$ on the set $\{x : h(x) = 0,\ g(x) \le 0\}$. Then, $x^*$ is also a regular local minimizer of $f$ on the set $\{x : h(x) = 0,\ g_j(x) = 0,\ j \in J(x^*)\}$ (see Exercise 21.16). Note that the latter constraint set involves only equality constraints. Therefore, from Lagrange's theorem, it follows that there exist vectors $\lambda^* \in \mathbb{R}^m$ and $\mu^* \in \mathbb{R}^p$ such that
$$
Df(x^*) + \lambda^{*\top} Dh(x^*) + \mu^{*\top} Dg(x^*) = 0^\top,
$$
where $\mu_j^* = 0$ for all $j \notin J(x^*)$. To complete the proof it remains to show that $\mu_j^* \ge 0$ for all $j \in J(x^*)$ (and hence $\mu_j^* \ge 0$ for all $j = 1, \dots, p$, i.e., $\mu^* \ge 0$). We use a proof by contradiction. So suppose that there exists $j \in J(x^*)$ such that $\mu_j^* < 0$. Let $\hat{S}$ and $\hat{T}(x^*)$ be the surface and tangent space defined by all the other active constraints at $x^*$ (that is, all of them except $g_j$). Specifically,
$$
\hat{S} = \{\, x : h(x) = 0,\ g_i(x) = 0,\ i \in J(x^*),\ i \ne j \,\}
$$
and
$$
\hat{T}(x^*) = \{\, y : Dh(x^*)y = 0,\ Dg_i(x^*)y = 0,\ i \in J(x^*),\ i \ne j \,\}.
$$
We claim that, by the regularity of $x^*$, there exists $y \in \hat{T}(x^*)$ such that
$$
Dg_j(x^*) y \ne 0.
$$
To see this, suppose that $\nabla g_j(x^*)^\top y = Dg_j(x^*) y = 0$ for all $y \in \hat{T}(x^*)$. This implies that $\nabla g_j(x^*) \in \hat{T}(x^*)^\perp$. By Lemma 20.1, this, in turn, implies that
$$
\nabla g_j(x^*) \in \operatorname{span}\bigl[\, \nabla h_k(x^*),\ k = 1, \dots, m;\ \nabla g_i(x^*),\ i \in J(x^*),\ i \ne j \,\bigr].
$$
But this contradicts the fact that $x^*$ is a regular point, which proves our claim. Without loss of generality, we may assume that this $y$ satisfies $Dg_j(x^*) y < 0$ (if not, replace $y$ with $-y$, which also lies in $\hat{T}(x^*)$). Consider the Lagrange condition, rewritten as
$$
Df(x^*) + \lambda^{*\top} Dh(x^*) + \mu_j^* Dg_j(x^*) + \sum_{i \ne j} \mu_i^* Dg_i(x^*) = 0^\top.
$$
If we postmultiply the above by $y$ and use the fact that $y \in \hat{T}(x^*)$, we get
$$
Df(x^*) y = -\mu_j^* Dg_j(x^*) y.
$$
Because $Dg_j(x^*) y < 0$ and we have assumed that $\mu_j^* < 0$, we have
$$
Df(x^*) y < 0.
$$
Because $y \in \hat{T}(x^*)$, by Theorem 20.1 we can find a differentiable curve $\{x(t) : t \in (a, b)\}$ on $\hat{S}$ such that there exists $t^* \in (a, b)$ with $x(t^*) = x^*$ and $\dot{x}(t^*) = y$. Now,
$$
\frac{d}{dt} f(x(t)) \Big|_{t = t^*} = Df(x^*) y < 0.
$$
The above means that there is a $\delta > 0$ such that for all $t \in (t^*, t^* + \delta]$, we have
$$
f(x(t)) < f(x(t^*)) = f(x^*).
$$
On the other hand,
$$
\frac{d}{dt} g_j(x(t)) \Big|_{t = t^*} = Dg_j(x^*) y < 0,
$$
and hence for some $\varepsilon > 0$ and all $t \in [t^*, t^* + \varepsilon]$, we have $g_j(x(t)) \le 0$. Therefore, for all $t \in (t^*, t^* + \min\{\delta, \varepsilon\}]$, we have $g_j(x(t)) \le 0$ and $f(x(t)) < f(x^*)$. Because the points $x(t)$, $t \in (t^*, t^* + \min\{\delta, \varepsilon\}]$, lie on $\hat{S}$ (and the inactive inequality constraints remain strictly negative for $t$ sufficiently close to $t^*$, by continuity), they are feasible points with lower objective function values than $x^*$. This contradicts the assumption that $x^*$ is a local minimizer, which completes the proof.